CQRS Overview - Chicago Alt.Net User Group January Meeting Recap

(Each month I plan to attend technical user groups in the Chicago area to (re-)learn from peers' experiences with new and existing technologies and to network with like-minded techies. This blog is one in a series of recaps of some of the more interesting aspects of the meetings for my own purposes (this is a “web log” after all) and for others to get a general taste of what’s available in the Chicago user group scene.)

UserGroup: Chicago Alt.Net

Location: Willis (aka Sears) Tower


Meeting Date: Wednesday, January 13, 2010


If you're in the Chicagoland area and do any .Net programming, I recommend this group. It's got a decent turnout (usually). I've been to three meetings with anywhere from 20 to 40 people in attendance. And they have good giveaways if you stay till the end. Today was especially good: a copy of Windows 7 Ultimate, Office 2007, the JetBrains product of your choice (ReSharper, dotTrace, TeamCity, IntelliJ -- great products if you're not familiar), a $50 Barnes & Noble card, and of course a few Microsoft tech books. And free pizza!


Presenter: John Nuechterlein (aka Jdn) (http://www.blogcoward.com).

Topic: CQRS in roughly an hour or so

(CQRS == "Command Query Responsibility Segregation")


So, I'll be honest: Going into this presentation, I had only a very vague idea of what CQRS is and even less idea why I should care. (I had actually planned to play poker that night instead, but poker got canceled :-( ). Leaving this presentation, my mind was swirling with all of the projects where CQRS would have been great to use (and the many parallels with the "massively" distributed computing environment we developed at Wayport -- where "massive" is defined in pre-Google, pre-Facebook terms).

So, what is CQRS? Frankly, I couldn't do it justice in this blog -- but I needn’t try, since smarter, more eloquent people have done so before me. Actually, from what I understand, CQRS got its start in the blogosphere. For more depth, here is one key link (http://elegantcode.com/2009/11/11/cqrs-la-greg-young/), but here's my lame attempt at a quick definition:


CQRS is a system architecture/design pattern that separates the act of reading data (query) from taking action (command) in order to produce a system which easily scales and provides some useful benefits (such as "playable" event logs) that make the maintenance of the system less burdensome. (For my purposes, I'm going to designate CQRS as an "architecture", partly because I don't want to write "architecture/design pattern" anymore.)



In my mind, CQRS lends itself pretty well to web-based systems, and SOA/SaaS at that, although it could be applied elsewhere. I'm probably getting ahead of myself a bit, but that statement is useful for backing this next statement. To consider using CQRS, you must buy into a fundamental assertion: your data is always stale. How stale depends on your system, but the fact that my last blog entry was about data caching techniques just goes to show that, particularly in web-based systems, we intentionally make some of our data stale by putting it into a cache.



But even without intentional caching, your data is stale. Consider this: you have an eCommerce site (say, amazon.com). Your end user pulls up the product page for the Widget2010 product, which includes price and quantity in stock information. In the 60 seconds it takes the user to read the page and click the shopping cart button, five other people have ordered their own Widget2010, so the quantity in stock is now less than what's on the first user's screen -- thus, the data is stale. You know this -- you expect this -- and you've already coded a dozen ways to deal with this, so accepting that "your data is always stale" is really not that big a leap of faith.

So why is that important? Because the CQRS architecture separates the reading of data and the acting upon data into two separate logical areas, where the reading area has a data cache which is stale. Now, it may only be 1 ms stale, but stale nonetheless.

So let's dive into my high-level summary of Jdn's high-level overview. (Note: if you want to bypass my bias and possibly complete misinterpretation of the presentation, it is (or will soon be) available on the http://chicagoalt.net website in video form.)

CQRS has four logical areas in your system design, as pictured in this drawing stolen squarely from the blog linked above:


Queries
This is the "reporting" piece -- or said another way, this is the read-only interface into your data. Basically, it's a lightweight data layer reading from your data store and providing DTOs back to your UI. It does NOT go through your Domain Model. It is simply a read-only view into your database. Your database, however, is really a cache of your data, and likely a local cache. In other words, since the data is just a cache of the "real" data, why not push it out as close to the UI as possible to minimize the latency to your UI? (Since the majority of your UI's interactions with your data are reads, making this link super-fast will in turn make your system faster.) Since these are read-only views into your data, the DTOs are extremely simple and can/should be specific to the consumer (i.e., no "product" DTO, but rather a productForOrderPage which only has the data needed by the order page), reducing the amount of data getting dragged around between layers. Why pull your full product from the db if you only need 20% of the information for the current page? Need to scale your UI? Just add another data store/cache -- it's practically a rubber stamp.
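To make the consumer-specific DTO idea a bit more concrete, here's a minimal sketch in Python (my choice of language, not the presenter's). Only the productForOrderPage name comes from above; READ_CACHE, its fields, and get_product_for_order_page are made up for illustration, and a real read side would sit on an actual data store rather than a dict.

```python
from dataclasses import dataclass

# Hypothetical read-side cache: rows already shaped for reading, keyed by product id.
READ_CACHE = {
    "widget2010": {
        "name": "Widget2010",
        "price_cents": 1999,
        "in_stock": 12,
        "description": "A long marketing description...",
        "supplier_id": 77,
        "warehouse_bin": "A-14",
    },
}


@dataclass(frozen=True)
class ProductForOrderPage:
    """Only the fields the order page needs; no behavior, no domain logic."""
    name: str
    price_cents: int
    in_stock: int


def get_product_for_order_page(product_id: str) -> ProductForOrderPage:
    # Read straight from the cache; the Domain Model is never involved.
    row = READ_CACHE[product_id]
    return ProductForOrderPage(row["name"], row["price_cents"], row["in_stock"])


dto = get_product_for_order_page("widget2010")
print(dto)
```

The order page gets three fields instead of six, and nothing on this path can change data.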

Commands
The next logical area of this architecture is the Command Bus/Command Handlers. This is where the next tenet of CQRS comes into play: system actions (aka "commands") have specific intent. If you've practiced Agile Development and you're familiar with the concept of the User Story -- basically requirements as documented from the viewpoint of a user's interaction with the system -- then this may be easy to grasp. Basically, Commands are actions taken on your Domain objects. For example: CustomerChangedAddressCommand or AddProductToShoppingCartCommand.

Commands are the only way to update data in your system and are seen as atomic actions which can be wholly accepted or rejected. Commands are generated by the UI and pushed onto the Command Bus and picked up by Command Handlers.  Command Handlers in turn pass those commands into the Domain.  Since commands go onto a bus, they can be queued, prioritized, etc, just like any message on a messaging bus, thus allowing you to ensure your most important commands are handled appropriately (another scalability knob you can adjust to meet your performance needs).
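Here's a rough sketch of how a command bus with a priority knob might look, in Python. The two command names come from the examples above; the CommandBus class, its register/send/process_all methods, and the priority numbers are my own assumptions for illustration, not anything from the presentation or a real messaging library.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerChangedAddressCommand:
    customer_id: int
    new_address: str


@dataclass(frozen=True)
class AddProductToShoppingCartCommand:
    customer_id: int
    product_id: str


class CommandBus:
    """Hypothetical bus: queues commands by priority, dispatches each to its handler."""

    def __init__(self):
        self._handlers = {}
        self._queue = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def register(self, command_type, handler):
        self._handlers[command_type] = handler

    def send(self, command, priority=10):
        # Lower number means handled sooner.
        heapq.heappush(self._queue, (priority, next(self._order), command))

    def process_all(self):
        while self._queue:
            _, _, command = heapq.heappop(self._queue)
            self._handlers[type(command)](command)


handled = []
bus = CommandBus()
bus.register(CustomerChangedAddressCommand, lambda c: handled.append(("address", c.customer_id)))
bus.register(AddProductToShoppingCartCommand, lambda c: handled.append(("cart", c.product_id)))

bus.send(CustomerChangedAddressCommand(1, "123 Main St"))
bus.send(AddProductToShoppingCartCommand(1, "widget2010"), priority=1)
bus.process_all()
print(handled)
```

The cart command was sent second but is handled first because of its priority. In a real system the handlers would pass the command into the Domain instead of appending to a list.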

Now, since commands require specific intent (i.e., user updates their address or places an item into the shopping cart), this does have implications for your UI -- specifically, you can't have "Excel-like screens". There's no "update everything about this user and here's all the data" command (or there shouldn't be), so if that's what you're looking for, you may want to look elsewhere. But honestly, this may not be a bad thing, as it forces some system designs which will likely result in more user-friendly, reliable systems in the long run.

Internal Events / Domain
This area is the authoritative knowledge source.  Your business logic resides here. Your Domain objects reside here.  Here is where you'll find the Event Store. This is the most complicated part of the system, and is where the presenter struggled at times to explain some concepts / answer some questions, so this is where you'll likely want to ensure you've done your homework. (To his credit, Jdn fully admitted up front he did not know all of the ins-and-outs of this area and did his best to explain).

As commands come into your Domain from the Command Handlers, your Domain objects validate business rules to determine if the command is valid for the current state of the world and either reject the command wholesale, or, in one atomic action, update the state of the world according to the command (thus, an “event” occurs).
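As I understood it, the accept-or-reject step looks something like the Python sketch below. The Inventory class, the CommandRejected exception, the ProductAddedToShoppingCart event, and the "adding to a cart reserves stock" rule are all hypothetical, picked only to give the domain a business rule to check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddProductToShoppingCartCommand:
    product_id: str
    quantity: int


@dataclass(frozen=True)
class ProductAddedToShoppingCart:  # the event recorded when the command is accepted
    product_id: str
    quantity: int


class CommandRejected(Exception):
    """Raised when a command violates a business rule; no state changes."""


class Inventory:
    def __init__(self, stock):
        self.stock = dict(stock)

    def handle(self, command):
        # Validate against the current state of the world first.
        available = self.stock.get(command.product_id, 0)
        if command.quantity <= 0 or command.quantity > available:
            raise CommandRejected(f"cannot reserve {command.quantity} of {command.product_id}")
        # Accepted: the state change and the event happen together.
        event = ProductAddedToShoppingCart(command.product_id, command.quantity)
        self._apply(event)
        return event

    def _apply(self, event):
        self.stock[event.product_id] -= event.quantity


inventory = Inventory({"widget2010": 3})
accepted = inventory.handle(AddProductToShoppingCartCommand("widget2010", 2))

try:
    inventory.handle(AddProductToShoppingCartCommand("widget2010", 5))
    rejected = False
except CommandRejected:
    rejected = True

print(accepted, rejected, inventory.stock)
```

The second command asks for more than is left, so it is rejected and the stock stays where the first command left it.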


Events are written to the Event Store, which is persisted, likely in an RDBMS, ODBMS, etc. A snapshot of the state of the world is taken periodically, and if you want to recreate a point in time, just take the previous snapshot and replay the events in order from that snapshot until the point in time you care about. Your domain could just remain in memory, if you'd like -- otherwise you'd pull the most recently stored (via snapshot) version of your domain object(s) and replay any new events against that object until it's fully restored, then execute your new command and presto-chango!
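The snapshot-plus-replay idea can be sketched in a few lines of Python. Everything here (the StockChanged event, the EventStore class, the sequence numbering, the in-memory lists) is an assumption for illustration; a real store would persist events and snapshots somewhere durable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StockChanged:
    sequence: int
    product_id: str
    delta: int


def apply(state, event):
    # Pure function: returns a new state, never mutates the old one.
    new_state = dict(state)
    new_state[event.product_id] = new_state.get(event.product_id, 0) + event.delta
    return new_state


class EventStore:
    def __init__(self):
        self.events = []     # ordered; events[i] has sequence i + 1
        self.snapshots = []  # (sequence, state) pairs, oldest first

    def append(self, product_id, delta):
        event = StockChanged(len(self.events) + 1, product_id, delta)
        self.events.append(event)
        return event

    def take_snapshot(self):
        sequence = len(self.events)
        self.snapshots.append((sequence, self.state_at(sequence)))

    def state_at(self, sequence):
        # Start from the latest snapshot at or before the requested point...
        start, state = 0, {}
        for snap_seq, snap_state in self.snapshots:
            if snap_seq <= sequence:
                start, state = snap_seq, snap_state
        # ...then replay only the events recorded after that snapshot.
        for event in self.events[start:sequence]:
            state = apply(state, event)
        return dict(state)


store = EventStore()
store.append("widget2010", 10)
store.append("widget2010", -3)
store.take_snapshot()            # snapshot after event 2
store.append("widget2010", -2)
store.append("widget2010", -1)

print(store.state_at(1), store.state_at(3), store.state_at(4))
```

Asking for the state at event 3 starts from the snapshot at event 2 and replays a single event; asking for event 1 has no usable snapshot, so it replays from the beginning.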

Now this was of particular interest to me from a troubleshooting/audit standpoint. If a problem occurs on the site, pick a snapshot from right before the problem occurred and replay your events to reproduce the issue. (QA engineers applaud here.) If a new version deployment goes horribly wrong, roll back the events until right before the deployment. Theoretically, you could even re-run the events again against the older version of the software and (unless the events weren't supported in that version) recover the data changes. Try doing that when your domain is specific to your db schema or your audit history is at the db table level. (How do you replay an "insert" when the columns have changed?)

One note here is that your Event Store is "write only" (append-only is probably the more accurate term, since you do read the events back to replay them) -- meaning you don't ever delete things from your domain, you just adjust them. The presenter used the analogy of an accountant ("accountants don't use erasers"). If an accountant finds an error in the ledgers, they don't edit that line item -- instead they create an adjustment line item to offset the difference. The Event Store is similar -- you create events to adjust/negate/otherwise manipulate your data.
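The accountant analogy translates almost directly into code. Here's a tiny Python sketch with a made-up ledger (the record and balance functions and the amounts are all hypothetical); the point is only that the fix for a bad entry is a new entry, not an edit.

```python
ledger = []  # append-only: entries are added, never edited or removed


def record(description, amount_cents):
    ledger.append((description, amount_cents))


def balance():
    return sum(amount for _, amount in ledger)


record("Sale of Widget2010", 10000)
record("Sale of Widget2010", 15000)       # mistake: should have been 10500
record("Adjustment for entry 2", -4500)   # offsetting entry instead of an erasure

print(len(ledger), balance())
```

The mistaken entry is still there for the audit trail, and the balance is correct anyway.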

External Events / Publication
Now, to go full circle, the Domain / Event handling system will publish any events that it handles (but not those that it rejects). Any data stores/caches will subscribe to that feed, and will update themselves based on those events. Thus, your data cache is only as stale as the time it takes to process the published events. Again, you use a message bus here (or a web service, etc.) and use prioritization queues if you wish to enhance performance/scalability.
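A bare-bones version of that publish/subscribe loop, sketched in Python. The EventPublisher and StockReadCache classes and the dict-shaped event are assumptions of mine; a real system would use an actual message bus rather than direct in-process callbacks.

```python
class EventPublisher:
    """Hypothetical in-process stand-in for a message bus."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, event):
        # Only accepted events are ever published; rejected commands produce none.
        for callback in self._subscribers:
            callback(event)


class StockReadCache:
    """A read-side cache that keeps itself current from the event feed."""

    def __init__(self):
        self.in_stock = {}

    def on_event(self, event):
        if event["type"] == "StockChanged":
            product_id = event["product_id"]
            self.in_stock[product_id] = self.in_stock.get(product_id, 0) + event["delta"]


publisher = EventPublisher()
cache_a = StockReadCache()
cache_b = StockReadCache()  # scaling reads: just subscribe another cache
publisher.subscribe(cache_a.on_event)
publisher.subscribe(cache_b.on_event)

publisher.publish({"type": "StockChanged", "product_id": "widget2010", "delta": 10})
publisher.publish({"type": "StockChanged", "product_id": "widget2010", "delta": -3})

print(cache_a.in_stock, cache_b.in_stock)
```

Adding a second cache is one extra subscribe call, which is the "rubber stamp" scaling described under Queries.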

This is where the concept of "Eventual Consistency" is used -- that is, your Domain and your Data Stores will eventually sync up, just not necessarily in "real time". We're OK with that because we've agreed that latency is almost always OK and availability trumps correctness in the Data Stores -- because "your data is always stale" anyway.
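To see the lag in miniature, here's a Python sketch where published events sit in a queue until the read side gets around to them. The names (domain_stock, read_cache, pending, handle_order, drain) and the numbers are hypothetical; the queue just stands in for whatever transport sits between the Domain and the Data Stores.

```python
from collections import deque

domain_stock = {"widget2010": 10}  # authoritative state on the write side
read_cache = {"widget2010": 10}    # what the UI actually reads
pending = deque()                  # events published but not yet processed


def handle_order(product_id, quantity):
    domain_stock[product_id] -= quantity
    pending.append((product_id, domain_stock[product_id]))


def drain():
    # The read side catches up whenever it processes the queue.
    while pending:
        product_id, in_stock = pending.popleft()
        read_cache[product_id] = in_stock


handle_order("widget2010", 5)
stale_read = read_cache["widget2010"]   # read side has not caught up yet
drain()
fresh_read = read_cache["widget2010"]   # now consistent with the domain

print(stale_read, fresh_read, domain_stock["widget2010"])
```

Between handle_order and drain the UI would still show the old quantity, which is exactly the staleness we agreed to live with.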

And the big finish...
To conclude, the presenter touched on reasons why you wouldn't want to use CQRS, including:

· it's new, it's different
· multiple data stores (maybe you don't want this)
· operational complexity
· lots of commands, events, handlers, etc.

…and my conclusion
All in all, this is definitely something I will consider for large systems in the future, although it is likely overkill for most systems in the market I serve. I suggest doing some Google searches if you're interested at all in learning more (there seems to be a good deal of material out there: videos, webinars, blogs, etc.).